\documentclass[12pt]{article}
\usepackage{titlesec}
\usepackage{graphicx,setspace,hyperref,amsmath,amsfonts,times,multirow,ccaption,tabularx,verbatim,booktabs,mdwlist}
\usepackage[top=1in, bottom=1in, left=1in, right=1in]{geometry}
\usepackage{natbib}
\bibpunct{(}{)}{;}{a}{}{,} %set in-line reference punctuation
\setlength{\bibsep}{0.1in} %set spacing between references
\setlength{\bibhang}{.1in} %set hanging indent for references
\renewcommand\bibsection{\section*{\refname}} %Changes "Bibliography" to "References"
\expandafter\def\expandafter\quote\expandafter{\quote\singlespacing} %single space quote environment
\captionnamefont{\bfseries}
\captiontitlefont{\bfseries}
\newcommand{\tablenote}[1]{\singlespacing\footnotesize #1}
\renewcommand{\labelitemi}{--}
\renewcommand{\abstractname}{\vspace{-\baselineskip}}
\usepackage[T1]{fontenc}
\usepackage{lmodern}
\hypersetup{
    bookmarks=true,         % show bookmarks bar?
    unicode=false,          % non-Latin characters in Acrobat's bookmarks
    pdftoolbar=true,        % show Acrobat's toolbar?
    pdfmenubar=true,        % show Acrobat's menu?
    pdffitwindow=false,     % window fit to page when opened
    pdfstartview={FitH},    % fits the width of the page to the window
    pdftitle={How Does Treatment Self-Selection Affect Inferences About Political Communication?},    % title
    pdfauthor={Thomas J. Leeper},     % author
    pdfsubject={Political Science},   % subject of the document
    pdfkeywords={politics} {public opinion} {information acquisition} {engagement} {motivated reasoning}, % list of keywords
    pdfnewwindow=true,      % links in new window
    pdfborder={0 0 0}
}
\usepackage{tikz}
\usetikzlibrary{shapes,arrows,positioning,decorations.pathreplacing}

\title{How Does Treatment Self-Selection Affect Inferences About Political Communication?}
\author{Thomas J. Leeper\thanks{An earlier version of this paper was presented at the Midwest Political Science Association 2012 Annual Meeting, Chicago, IL and the International Society for Political Psychology 2012 Annual Meeting, Chicago, IL. Thanks are due to John Brehm, Jamie Druckman, Samara Klar, Jeff Mondak, Marco Steenbergen, and the editorial team at JEPS for feedback on various stages of the project, and to Jamie, Fay Lomax Cook, and Toby Bolsen for sharing survey data from the $t1$ panel wave.}\\
Department of Government\\
London School of Economics and Political Science\\
\href{mailto:thosjleeper@gmail.com}{thosjleeper@gmail.com}
}


\begin{document}

\maketitle

{\abstract\noindent Ecological validity is vital to experimental research because designs that are too artificial may not speak to any real-world political phenomenon. One such concern is treatment self-selection: if individuals in the real world self-select treatments, such as political communications, how well does the sample average treatment effect estimate the effects of message exposure for those individuals who would, if given the choice, opt into or out of receiving treatment? This study shows that randomization masks effect heterogeneity between individuals who would select different messages if given the choice. Yet such selections are themselves complex, revealing additional challenges for realistically studying treatments prone to self-selection. The evidence of effect heterogeneity raises questions about the appropriateness of random assignment experiments for studying political communication, and the results more broadly advance our understanding of citizens' selection into and responses to communications when, as is often the case, they have a choice over what messages to receive.}

\doublespacing
\clearpage

Experiments are seen as a gold standard because randomized assignment to a treatment provides unparalleled internal validity, ensuring that differences in outcomes between groups are due to the treatment alone. While experiments are sometimes criticized for lacking external validity \citep[see][]{McDermott2011, ShadishCookCampbell2001}, another important but less studied concern is the notion of ecological validity: i.e., the ``realism'' of the experiment. Ecological validity is vital to experimental design because studies that involve too stylized an experience for participants may not speak to any real-world political phenomenon of interest.

One recent concern about ecological validity relates to the notion of treatment self-selection \citep{LauRedlawsk2006, GainesKuklinski2011a, GainesKuklinski2011b, DruckmanFeinLeeper2012, ArceneauxJohnson2012}. If in the real world individuals self-select their own ``treatments'' then randomized exposure may not constitute a realistic exploration of that phenomenon. Given this natural self-selection, how well does the experimentally identified treatment effect estimate the effect of message exposure for those individuals who would --- if given the choice --- opt into or out of receiving it? This is particularly important in the study of political communication, where real-world exposure to messages is often self-selected. 

To answer this, the present research uses a population-based survey experiment to test how individuals respond to policy arguments that are either randomly assigned or self-selected by participants. The findings show that treatment randomization masks effect heterogeneity across individuals inclined to select alternative messages. Furthermore, a pretreatment manipulation of attitude importance in the experiment modified participants' choice of treatments and led to variations in apparent treatment effects, suggesting that the inferences from choice-based experiments hinge on which respondents self-select each treatment. The results speak to research on selective exposure, motivated reasoning, and political communication, and more generally to the use of experiments for studying treatments characterized by real-world self-selection.


\section*{Studying the Effects of Self-Selected\\Political Communications}\label{sec:Review}

Arguments conveyed by media and political elites are widely seen as an important source of political information that citizens can use when forming preferences about politics \citep{Disch2011, ChongDruckman2007a}. Much of the evidence for the influence of elite communications comes from randomized experiments that expose participants to different messages and then measure information effects on outcomes such as argument evaluations, opinions, issue importance, and information-seeking \citep[see, for example,][]{Ansolabehereetal1994, ArceneauxJohnson2012, BerinskyKinder2008, Brewer2003, BrewerGross2005, IyengarKinder1987, Levendusky2013a, MillerKrosnick2000, NelsonClawsonOxley1997, PettyCacioppo1986, TichenorDonohueOlien1970}. Questions have been raised, however, about the extent to which such experiments provide an adequate empirical test of real-world information exposure given that, in \citeauthor{Hovland1959}'s words: ``In an experiment the audience on whom the effects are being evaluated is one which is fully exposed to the communication. On the other hand, in naturalistic situations with which surveys are typically concerned, the outstanding phenomenon is the limitation of the audience to those who expose themselves to the communication'' (\citealt{Hovland1959}, 9; \citealt{BennettIyengar2008}, 724). For example, \citet{AlbertsonLawrence2009} use a field experimental design to study the effects of watching a television news program, but must address the fact that some people encouraged to watch the program chose not to watch it and some who were not encouraged watched it anyway. Their solution is to use the randomized encouragement as an instrument for viewing the program and identify the effect of the program on compliers (i.e., the effect for the unobservable subset of individuals who only watch the program if compelled to do so).
In cases such as this, it is unclear what the effect of the treatment would have been had it been applied to everyone (a consequence of encouragement non-compliance). It is equally unclear what the effect is for those who chose to view the program or for those who chose not to view it (in both cases because the CATE is an effect for those whose treatment uptake can be manipulated). In the face of practical challenges such as these, much experimental research continues to focus on the sample average treatment effect (SATE), which averages the effects for the two subgroups that would choose to be treated and would choose to be untreated, respectively \citep[but see][]{Levendusky2012, ArceneauxJohnson2012}.

Aside from the sample average treatment effect, Hovland encourages researchers to focus on these separate effects for ``those who expose themselves to the communication'' and those who do not. Why would we care about the treatment effect for these two groups? Consider the classic experiment by \citet{NelsonClawsonOxley1997} in which laboratory participants were randomly assigned to watch a news story framing a hate rally in terms of either free speech or public order. While our interest may lie in knowing the effect of a hypothetical intervention in which all citizens receive the free speech news story, our interest in understanding citizens' interactions with media suggests that we also (and perhaps rather) want to know how that message affects different segments of the public. How are those who opt in to free speech framed news affected by it? How are those who would rather opt in to public order news affected by free speech news? The answers to these questions matter because much exposure is selective in this way. Indeed, despite early skepticism \citep{SearsFreedman1967}, it is now a relatively uncontroversial claim that citizens selectively expose themselves to political information \citep{BennettIyengar2008, BolsenLeeper2013, Stroud2011, Kim2007, Kim2009, IyengarHahn2009, Iyengaretal2008, Garrett2009a, Garrett2009b, GarrettCarnahanLynch2013, Feldman2011, SmithFabrigarNorris2008} and a recent meta-analysis suggests that \emph{political} selective exposure is especially prevalent \citep{Hartetal2009}. While individuals select messages based on prior attitudes, they also appear to engage in selective exposure according to ideology, habit, and topical interests \citep{BennettIyengar2008, Prior2007, IyengarHahn2009, Baum2002}. Given this selectivity, we should aim to understand how individuals are affected by treatments they actually encounter in the real world.

In what ways might we expect the effect of a political communication to differ across individuals inclined and disinclined to select it? There are two general possibilities. One is \textit{homogeneous effects}: the effects of exposure to a political communication are the same for those who would choose to receive the message as for those who would choose an alternative message. In the case of such homogeneity, the SATE reflects the effects for both groups and there are no concerns about what can be learned from a randomized experiment.\footnote{Another possibility is that even in the face of effect heterogeneity across individuals, message self-selection is as-if random so the groups of message self-selectors and non-selectors are indistinguishable from one another on average.} The other possibility is \textit{heterogeneous effects}: the effects of exposure to a political communication are different for those who would choose and those who would not choose a given message.

Why might effect heterogeneity exist? One possibility is suggested by research on motivated reasoning \citep{Kunda1990, LodgeTaber2000, Nickerson1998, TaberLodge2006}. If an individual selects a message that reinforces their prior views (e.g., an individual supportive of a policy chooses a supportive argument), then exposure to a preferred message is likely to either have no effect due to ceiling constraints or to make that individual's opinion more extreme. By contrast, an individual for whom that message is dispreferred is likely to counterargue against this dispreferred message, such that exposure has no effect due to their motivated resistance to countervailing arguments or even has a backfire effect by which the individual becomes more negative in response to a supportive message. Thus, homogeneous effects occur if messages are uniformly persuasive (which is unlikely given past evidence for polarization effects; \citealt{TaberLodge2006}) or if a message has no effect on anyone. Heterogeneous effects across these groups seem more likely given that null effects for one group paired with either positive or negative effects for the other group produce a pattern of heterogeneity. Some past research suggests that individuals are likely to see attitude-congruent arguments as stronger and more effective than incongruent arguments \citep{Dittoetal1998}, which means that the effects of political communications on opinions for message selectors and non-selectors are likely to point in opposite directions. It is therefore reasonable to expect effect heterogeneity, which means the sample average treatment effect may not be representative of the effects of exposure to political communications for either those inclined to choose the treatment or the remainder of the population.

Given the theoretical grounding in motivated reasoning, it is also worth considering whether this expected pattern of effect heterogeneity holds when individuals are more or less inclined to engage in attitude-congruent selective exposure. Attitude importance is one mechanism thought to affect the degree of attitude-congruent selective exposure \citep{Holbrooketal2005, TaberLodge2006, Leeper2014}, so when individuals believe an issue to be very important they are more likely to choose congenial messages. By contrast, under low importance, individuals are thought to make more balanced information choices. High attitude importance should also increase information-seeking in general, as has been demonstrated in prior work \citep{Krosnick1990c, Holbrooketal2005}. Therefore when issue opinions are felt to be personally important, it is reasonable to expect greater effect heterogeneity. When issue opinions are felt to be less important, the increased likelihood of attitude-incongruent selection means that the audiences for alternative political communications are more similar and the effects of exposure to a given message should be, on average, more similar.

\section*{Experimental Design}\label{sec:Design}

I use a hybrid experimental design --- combining randomized exposure and message self-selection --- to test for this expected pattern of effect heterogeneity across those selecting into one of two alternative arguments about a policy issue. In the hybrid design, one-half of participants participate in a randomized experiment and the other half participate in an observational study involving treatment self-selection \citep{GainesKuklinski2011b}. The present research builds on past research by employing a hybrid design with three additional features: (1) a pretreatment manipulation of attitude importance that is meant to modify the degree to which respondents choose treatment messages consistent with their prior attitudes, (2) a two-wave panel design that allows for a clean measure of prior opinions, which are a theorized cause of message selection, and (3) a nationally representative sample of adult participants.

I test for heterogeneous effects between message selectors and non-selectors in response to randomly assigned or self-selected information on the issue of ``renewable energy portfolio'' standards, which require electrical utilities to produce energy from renewable resources.\footnote{U.S. Department of Energy, 2003, ``Analysis of a 10-percent Renewable Portfolio Standard,'' Office of Integrated Analysis and Forecasting of the Energy Information Administration; Chen, Cliff, Ryan Wiser, and Mark Bolinger, 2007, ``Weighing the Costs and Benefits of State Renewables Portfolio Standards: A Comparative Analysis of State-Level Policy Impact Projections,'' Report prepared for the Permitting, Siting, and Analysis Division, Office of Electricity Delivery and Energy Reliability, U.S. Department of Energy.} %Given that the policy evokes environmental and pocketbook concerns, the importance of respondents' attitudes can be manipulated, thus enabling clear causal inferences about the effects of selective exposure. And information can be provided that will influence issue opinions, by emphasizing different considerations such as positive benefits for the environment, the role of government in energy production, and potential effects on consumer energy prices. The policy has no clear partisan sponsor and, while potentially salient for some citizens, is typical of most policies --- accessible but likely not a highly salient (or a priori) especially important issue. 
The survey took place in two parts. The first wave of data was collected in Summer 2010 (hereafter, $t1$) and involved collection of demographic covariates and baseline attitudes. The second, experimental wave was collected in Spring 2011 (hereafter, $t2$). The two-wave design is advantageous because it provides a clean measure of $t1$ opinion, avoids introducing accessibility or consistency biases into respondents' behavior during the $t2$ survey experiment, and enables estimation of opinion effects using more precise within-subjects, pre-/post-treatment changes.\footnote{Given that the issue received almost no media coverage in the intervening months, there is no reason to believe that any subjects were substantially influenced between the earlier and present wave (an assumption that is testable by comparing present attitudes of those in the control condition against their responses in the earlier wave). On average, opinions became slightly more negative between the two waves.} %Individuals were coded as either \emph{opponents}, neutral, or \emph{supporters} of renewable energy portfolio standards based upon their $t1$ attitudes.


\begin{figure*}
\caption{Experimental Design and Treatment Group Sample Sizes}\label{fig:conditions}
\vspace{1em}
\tikzstyle{block} = [draw, text width=7.5em, text centered]
\tikzstyle{line} = [draw, >=latex']
\begin{tikzpicture}[scale=0.45, font=\sffamily\small, node distance=1cm]
    \node[block] (X) {Background\\Questionnaire ($t1$)};
    \node[block, above right=1.5 and 1 of X] (High) {High\\Importance\\($t2$)};
        \node[block, above right=0.25 and 1 of High] (HiChoice) {Choice};
            \node[block, above right=0 and 0.5 of HiChoice] (HiChoicePro) {Chose Pro (204)};
            \node[block, below right=0 and 0.5 of HiChoice] (HiChoiceCon) {Chose Con (67)};
            \path [line] (HiChoice.east) -- (HiChoicePro.west);
            \path [line] (HiChoice.east) -- (HiChoiceCon.west);
        
        \node[block, below right=0.25 and 1 of High] (HiCaptive) {Captive};
            \node[block, above right=0 and 0.5 of HiCaptive] (HiCaptivePro) {Pro (66)};
            \node[block, below right=0 and 0.5 of HiCaptive] (HiCaptiveCon) {Con (68)};
            \path [line] (HiCaptive.east) -- (HiCaptivePro.west);
            \path [line] (HiCaptive.east) -- (HiCaptiveCon.west);

        \path [line] (High.east) -- (HiChoice.west);
        \path [line] (High.east) -- (HiCaptive.west);
    
    \node[block, right=9.25 of X, color=gray] (Control) {Control (66)};
    
    \node[block, below right=1.5 and 1 of X] (Low) {Low\\Importance\\($t2$)};
        \node[block, above right=0.25 and 1 of Low] (LoChoice) {Choice};
            \node[block, above right=0 and 0.5 of LoChoice] (LoChoicePro) {Chose Pro (173)};
            \node[block, below right=0 and 0.5 of LoChoice] (LoChoiceCon) {Chose Con (101)};
            \path [line] (LoChoice.east) -- (LoChoicePro.west);
            \path [line] (LoChoice.east) -- (LoChoiceCon.west);
        
        \node[block, below right=0.25 and 1 of Low] (LoCaptive) {Captive};
            \node[block, above right=0 and 0.5 of LoCaptive] (LoCaptivePro) {Pro (67)};
            \node[block, below right=0 and 0.5 of LoCaptive] (LoCaptiveCon) {Con (67)};
            \path [line] (LoCaptive.east) -- (LoCaptivePro.west);
            \path [line] (LoCaptive.east) -- (LoCaptiveCon.west);

        \path [line] (Low.east) -- (LoChoice.west);
        \path [line] (Low.east) -- (LoCaptive.west);
    
    
    \path [line] (X.east) -- (High.west);
    \path [line] (X.east) -- (Low.west);
    \path [line, color=gray] (X.east) -- (Control.west);
    
	\node[below right=0 and .2 of HiChoicePro] (HiYChoice) {$\bar{Y}_{Choice}$};
	\draw [decorate, decoration={brace,amplitude=7pt}]
		(HiChoicePro.north east) -- (HiChoiceCon.south east);
	\node[right=.2 of HiCaptivePro] (HiYPro) {$\bar{Y}_{Pro}$};
	\draw [decorate, decoration={brace,amplitude=5pt}]
		(HiCaptivePro.north east) -- (HiCaptivePro.south east);
	\node[right=.2 of HiCaptiveCon] (HiYCon) {$\bar{Y}_{Con}$};
	\draw [decorate, decoration={brace,amplitude=5pt}]
		(HiCaptiveCon.north east) -- (HiCaptiveCon.south east);
	
	\node[below right=0 and .2 of LoChoicePro] (LoYChoice) {$\bar{Y}_{Choice}$};
	\draw [decorate, decoration={brace,amplitude=7pt}]
		(LoChoicePro.north east) -- (LoChoiceCon.south east);
	\node[right=.2 of LoCaptivePro] (LoYPro) {$\bar{Y}_{Pro}$};
	\draw [decorate, decoration={brace,amplitude=5pt}]
		(LoCaptivePro.north east) -- (LoCaptivePro.south east);
    \node[right=.2 of LoCaptiveCon] (LoYCon) {$\bar{Y}_{Con}$};
	\draw [decorate, decoration={brace,amplitude=5pt}]
		(LoCaptiveCon.north east) -- (LoCaptiveCon.south east);
    
\end{tikzpicture}
{\tablenote Differences in sample sizes within the captive conditions reflect random assignment. Differences in sample sizes between ``Chose Pro'' and ``Chose Con'' conditions reflect treatment self-selection. Choice conditions were intentionally oversampled. The $\bar{Y}$ values at right are meant to clarify what groups are used for estimating treatment effects.}
\end{figure*}


\subsection*{Manipulations}

The $t2$ experiment involved three manipulations: a manipulation of attitude importance, the direction of the energy policy argument, and whether that argument was self-selected by respondents or captively (i.e., randomly) received. Figure \ref{fig:conditions} provides a visual summary of the design, along with treatment group sample sizes and notation used in defining causal effects. The first manipulation involved modifying the apparent personal impact of the energy proposal. This serves as an instrument for respondents' information choices (which would otherwise be nonrandomly determined) and provides insight into the extent to which self-selection behavior varies across contexts. The logic of the manipulation was that individuals who believe their self-interest is at stake would display higher attitude importance and thus be more likely to choose a treatment message (if given the choice) that was congruent with their prior attitudes \citep{Krosnick1989, BoningerKrosnickBerent1995}. Importance was manipulated to be high by telling respondents:

\begin{quote}
A new law is currently moving through Congress that would require your electricity provider to purchase energy from renewable sources (e.g. wind and solar). This is relevant to you since it will influence your energy bills and the environment. The law would go into effect immediately.
\end{quote}

\noindent Those in the low importance condition read that:

\begin{quote}
Some have proposed a bill that would require electricity providers to purchase energy from renewable sources (e.g. wind and solar). This is probably not directly relevant to you because Congress does not appear to be ready to act on the bill and even if they did it is unlikely to personally affect you.
\end{quote}

\noindent A post-treatment manipulation check asked ``How important to you personally is your opinion about this renewable energy restriction?'' The results of the manipulation check confirm that attitude importance was manipulated. On a 0--1 scale, those in high importance conditions rated their attitude importance at 0.77 and those in low importance conditions rated their attitude importance at 0.69, a statistically significant difference ($p<0.01$). The mean attitude importance reported by a control condition that received no manipulation and no information was 0.72. There was also no indication that the manipulation of importance affected information evaluations ($p=0.15$), post-treatment opinions ($p=0.42$), or within-subject opinion changes ($p=0.34$).\footnote{\label{ftn:pretest}Three rounds of pilot tests were conducted with 80 participants each in order to determine the question and manipulation wordings. Participants were recruited from Amazon Mechanical Turk and compensated \$0.20 each for their participation. In the final pretest, the high importance manipulation achieved a mean importance (on a 7-point scale) of 5.66 (SD=1.08), while the low importance manipulation achieved a mean importance of 5.2 (SD=1.44). This effect is small, but in the intended direction. It is also a larger effect than in either of the previous pretests. The difficulty of manipulating attitude importance \citep{VisserHolbrookKrosnick2007} makes the relatively small apparent effects in the small-$n$ pretest seem reasonable.}

The second manipulation varied whether the information respondents read was supportive (Pro) or opposed (Con) to the policy. The Pro message was entitled ``Renewable Energy Rules Beneficial'' and the Con message was entitled ``Renewable Energy Rules Ineffective.'' Exact text of the messages is available in Appendix A. An effort was made to ensure that the informational content of the Pro and Con messages was near-identical. The difference between the two treatments is in the language chosen to describe the same basic facts. Pilot testing confirmed that messages were equally effective. The Pro message was rated as 5.08 (SD=1.38) on a 7-point effectiveness scale, while the Con message was rated 4.97 (SD=1.47) on the same scale. And, the Pro message produced attitudes (on a 7-point scale) of 5.63 (SD=1.64) and the Con message produced attitudes at 4.5 (SD=1.93), a difference that is in the intended direction. %The information treatments were coded as congruent (+1) if a supporter received Pro information or if an opponent received Con information; similarly choices were coded as incongruent (-1) if the opposite exposures occurred.\footnote{Those with neutral $t1$ attitudes are isolated from certain analyses as a result of this coding.}

The final manipulation involved how the informational treatments were assigned to respondents. One-third of the respondents were assigned to ``captive exposure'' conditions that involved simply reading either the Pro or Con argument (based on random assignment). The other two-thirds were presented with the headlines for each passage and told to choose one of them to read, which they were then given, directly emulating the process of selective exposure.\footnote{A control group exposed to no treatment was also included but is not analyzed here.}

\subsection*{Measures}

After respondents received the importance and informational manipulations, outcome measures were collected, including evaluations of the information received, attitude toward the policy, subjective intentions to obtain further information, and a behavioral measure directly tapping willingness to receive additional information in the form of an email message.\footnote{While the primary interest is in evaluations and opinions, the information-seeking measures are thought to be affected by heightened attitude importance and may be further shaped by exposure to issue arguments.} All variables are coded to range from 0 to 1, with higher scores indicating more positive evaluations, higher policy support, or greater intention to seek information. The opinion question read, ``Thinking about energy related restrictions, to what extent do you oppose or support requiring electricity providers to purchase energy generated from renewable sources (e.g., wind, solar)?'' and solicited responses on a seven-point scale from ``strongly opposed'' to ``strongly support.'' As already mentioned, this item was also measured on the $t1$ survey, enabling estimation of treatment effects for opinion using both post-treatment and pre/post changes.

The argument evaluation question read, ``How effective would you say the information you read was in making an argument about this energy-related restriction?'' The subjective information-seeking question asked ``How likely are you to seek more information about renewable energy requirements?'' The behavioral measure asked ``Can we send you an email with more information about renewable energy requirements?'' Responses to the latter measure were coded as 1 if the respondent entered their email address and 0 otherwise. The two information-seeking measures correlate to some extent ($r=0.44$). Treatment group means for all outcome measures are reported in Appendix B.

\subsection*{Estimation Strategy}

Following from \citet{GainesKuklinski2011b}, I estimate three different treatment effects for each outcome variable. The first is the familiar sample average treatment effect (SATE):

\begin{equation}
SATE = \bar{Y}_{Pro} - \bar{Y}_{Con}
\end{equation}
where $\bar{Y}_{Pro}$ is the mean outcome value among those captively assigned to the Pro message and $\bar{Y}_{Con}$ is the mean outcome value among those captively assigned to the Con message.

As \citeauthor{GainesKuklinski2011b} point out, this SATE is a weighted average of effects of the Pro (versus Con) message for different observable subsamples: one for those who would choose the Pro message and one for those who would choose the Con message \textit{if given the choice}.\footnote{Because respondents in choice conditions were only allowed a choice between Pro and Con messages, it is necessary to focus on the difference between receiving these two messages as opposed to a treatment-control comparison.} The second effect is therefore the effect of the treatment on those who would choose it (the \textit{Pro-Selector Effect}, hereafter the PSE):

\begin{equation}
{PSE} = \dfrac{\bar{Y}_{Choice} - \bar{Y}_{Con}}{\hat{\alpha}}
\end{equation}
where $\bar{Y}_{Choice}$ is the mean outcome value among all respondents assigned to the ``choice'' condition (see Figure \ref{fig:conditions}) and $\hat{\alpha}$ is the proportion of these individuals choosing the Pro message.
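
As a purely arithmetic illustration using the cell counts reported in Figure \ref{fig:conditions}, and assuming (the text does not state this explicitly) that $\hat{\alpha}$ is computed separately within each importance condition, the proportions choosing the Pro message would be approximately:
\begin{align*}
\hat{\alpha}_{High} &= \frac{204}{204 + 67} = \frac{204}{271} \approx 0.75, \\
\hat{\alpha}_{Low} &= \frac{173}{173 + 101} = \frac{173}{274} \approx 0.63.
\end{align*}
The subscripts $High$ and $Low$ are introduced here only for illustration and are not notation used elsewhere in the paper.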

The third effect captures the effect of receiving the Pro rather than Con message among those who would not choose the Pro message (i.e., the \textit{Con-Selector Effect}, hereafter the CSE):

\begin{equation}
{CSE} = \dfrac{\bar{Y}_{Pro} - \bar{Y}_{Choice}}{1-\hat{\alpha}}
\end{equation}
In other words, the PSE represents the effect of the Pro message treatment versus the Con message for those who would opt for the Pro message \textit{if given the opportunity} and the CSE represents the effect of the Pro message for those who would opt for the Con message \textit{if given the opportunity}. The PSE and CSE are thus two average effects for different subsets of the population. These effects are causally identified given random assignment to the choice and captive arms of the experiment, which makes individuals in each arm identical in expectation. One concern with this strategy is that being given a treatment choice per se (regardless of the treatment chosen) might affect the outcome, which would violate an exclusion restriction; this possibility is not easy to test empirically.
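
The logic behind these estimators can be sketched in potential-outcomes notation. The symbols that follow are illustrative assumptions rather than notation taken from the original analysis: let $Y_i(Pro)$ and $Y_i(Con)$ denote respondent $i$'s outcomes under each message, let $S_i \in \{Pro, Con\}$ denote the message that $i$ would choose, and let $\alpha = \Pr(S_i = Pro)$. Because assignment to the choice and captive arms is random, and assuming that the act of choosing has no direct effect on outcomes, the group means satisfy, in expectation:
\begin{align*}
E[\bar{Y}_{Choice}] &= \alpha \, E[Y_i(Pro) \mid S_i = Pro] + (1-\alpha) \, E[Y_i(Con) \mid S_i = Con], \\
E[\bar{Y}_{Con}] &= \alpha \, E[Y_i(Con) \mid S_i = Pro] + (1-\alpha) \, E[Y_i(Con) \mid S_i = Con], \\
E[\bar{Y}_{Choice}] - E[\bar{Y}_{Con}] &= \alpha \left( E[Y_i(Pro) \mid S_i = Pro] - E[Y_i(Con) \mid S_i = Pro] \right).
\end{align*}
Dividing the last line by $\alpha$ yields the quantity targeted by the PSE; the analogous argument using $\bar{Y}_{Pro}$ in place of $\bar{Y}_{Con}$ yields the CSE. The definitions of the three estimators also imply that the SATE is the $\hat{\alpha}$-weighted average of the two selector effects:
\begin{align*}
\hat{\alpha} \, PSE + (1-\hat{\alpha}) \, CSE &= (\bar{Y}_{Choice} - \bar{Y}_{Con}) + (\bar{Y}_{Pro} - \bar{Y}_{Choice}) \\
&= \bar{Y}_{Pro} - \bar{Y}_{Con} = SATE.
\end{align*}
This identity holds by construction, so a divergence between the PSE and CSE is compatible with any value of the SATE, including zero.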

Effect homogeneity occurs when the PSE and CSE are similar, and are in turn similar to the SATE. Effect heterogeneity occurs when these quantities diverge. If homogeneous, there is little reason to be concerned about the ecological validity of a captive exposure experiment or the inferences that can be drawn from that experiment to different segments of the population. Without such heterogeneity, the SATE is a reasonable estimate of treatment effects for Pro selectors and Con selectors.

\subsection*{Respondents}

Data were collected by Bovitz Research Group of Encino, CA, which provides an online panel of approximately one million respondents recruited through random digit dialing and empanelment of those with internet access. As with most internet survey samples, respondents participate in multiple surveys over time and receive compensation for their participation. A total of 879 respondents completed both waves and analysis is restricted to these respondents. A total of 885 respondents completed the first wave, suggesting there is little concern about sample attrition. The sample was drawn to demographically match the U.S. adult population and data are analyzed without weighting. Respondents had a median age of 49; 49.0\% were female, 75.0\% were white, 98.9\% had at least a high school degree, and 80.9\% had a university degree. The partisan composition was 39.2\% Democrats and 30.7\% Republicans.


\section*{Results}\label{sec:results}

I begin by examining the information choices made by respondents in the choice conditions. To start, it is clear that the manipulation of attitude importance was effective: 88\% of high importance respondents chose information congruent with their $t1$ opinion, while only 63\% of low importance respondents chose information congruent with their $t1$ opinion (a statistically significant difference).\footnote{Recall that 50\% choosing congruent information would be considered ``random'' choice.} This means the odds of a high importance respondent choosing the message congruent with their $t1$ opinion were roughly 7:1, while the odds for a low importance subject were only about 1.7:1, yielding an odds-ratio of 4.25, which is a dramatic and significant effect.\footnote{The odds-ratio would be equal to one if there were no effect of importance. In this case, the 95\% confidence interval for the odds-ratio is (2.68, 6.75), suggesting a large effect of attitude importance on attitude-congruent selective exposure. This result also holds when looking at supporters and opponents separately: among those with high importance attitudes, 88\% of supporters chose the Pro information and 85\% of opponents chose the Con information. By contrast, under low importance, only 65\% of supporters chose the Pro information and a mere 35\% of opponents chose the Con information. Given that all respondents were presented with the two headlines in the same order --- the Pro headline coming first --- it is possible that the slight preference for the Pro information among those with low importance is attributable to primacy effects. It is also possible that low importance respondents might have opted to read nothing if they had been given the choice, meaning that, lacking importance, people may disengage from political information entirely.
Furthermore, given that respondents' opinions leaned positive ($\bar{x}_{t1}$=0.76, SD=0.01), breaking out the results in this way confirms that the aggregate rates of attitude-congruent selective exposure are not simply a result of the sample disproportionately choosing the Pro message. As a comparison, of those with neutral $t1$ opinions, 49\% chose the Con message and 51\% chose the Pro message, suggesting their choices were relatively arbitrary and confirming the pretest finding that neither the Pro headline nor the Con headline was more enticing to read.} These variations in attitude-congruent selective exposure are consistent with expectations. As such, it is reasonable to expect that if there are heterogeneous effects of exposure to a Pro message among selectors and non-selectors, the pattern of that heterogeneity is likely to differ across individuals in the high and low importance groups.
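
As a rough check on the odds reported above, the calculation can be reproduced from the rounded congruence rates of 88\% and 63\%. Because these inputs are rounded, the figures below are approximations rather than the exact values behind the reported estimate:
\begin{align*}
\text{odds}_{high} &= 0.88 / 0.12 \approx 7.3, \\
\text{odds}_{low} &= 0.63 / 0.37 \approx 1.7, \\
\text{odds ratio} &\approx 7.3 / 1.7 \approx 4.3.
\end{align*}
This is in line with the reported odds-ratio of 4.25; the small discrepancy presumably reflects rounding of the percentages.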

Table \ref{tab:results} presents the main results, separately for the full sample in panel (a), the low importance condition in panel (b), and the high importance condition in panel (c). Because importance affected the degree of attitude-congruent selective exposure, separately estimating the effects in this way makes it possible to detect further effect heterogeneity when the pattern of treatment self-selection changes. This possible source of variation in selectivity has important methodological consequences because past work has implicitly assumed that selectors are \textit{fixed types} who would always choose the same way. That assumption has not previously been tested.

Looking at panel (a), we see sample average treatment effects (SATEs) estimated from the captive conditions for each of the five outcome measures (argument evaluation, opinion level, opinion change, planned information-seeking, and requests for an email with issue-relevant information), alongside the estimated PSEs and CSEs. In substantive terms, these five SATEs indicate that, on average across the whole sample: (1) the Pro argument is seen as more effective than the Con argument, which makes sense given the supportive attitudes of the sample, (2) exposure to the Pro message increases support for the policy, (3) this increase in support holds when measured by $t2-t1$ changes in opinion rather than $t2$ levels of opinion, (4) the Pro message insignificantly reduces self-reported plans to seek out issue-relevant information, and (5) the Pro message reduces the likelihood of requesting an email with more policy information.

\begin{table*}
\caption{Treatment Effect Estimates, by Importance Condition}\label{tab:results}
\begin{center}
\input{tables/results1.tex}
\vspace{0.5em}\\
(a) Full sample\\
\vspace{2em}

\input{tables/results2.tex}
\vspace{0.5em}\\
(b) Low Importance Condition\\
\vspace{2em}

\input{tables/results3.tex}
\vspace{0.5em}\\
(c) High Importance Condition
\end{center}
\tablenote{Note: Cell entries are estimated treatment effects, with bootstrapped standard errors in parentheses. The three estimated values of $\hat{\alpha}$ used in estimating the PSEs and CSEs are: 0.69 (full sample), 0.63 (low importance), and 0.75 (high importance).}
\end{table*}

The PSEs paint a largely similar story, with differences only in the magnitude but not direction of effects. The story is different for CSEs: while the Pro message still increases policy support for these individuals (as measured by changes over time), there are no other substantively sizable effects of exposure and none of the CSEs are statistically distinguishable from zero. In other words, there appears to be effect heterogeneity: those inclined to select the Pro message are affected in various ways, while there is little evidence of effects for those who would select the Con message.

We can now turn to panel (b) of Table \ref{tab:results}, which displays the low importance condition. Recall that low importance diminished attitude-congruent selective exposure, such that the groups represented by the PSE and CSE are more similar to one another here than in the high importance condition, where the audiences for the two messages were more divided along attitudinal lines. The SATEs in this condition are very similar to those for the sample as a whole: the Pro message is seen as more effective, it increases policy support as measured by both $t2$ opinions and opinion changes over time, and decreases the rate of requesting a follow-up email, while there is no effect on intended information-seeking. Turning to the PSEs in column 2, we see a pattern generally consistent with the SATEs in effect direction but only one effect is statistically distinguishable from zero: exposure to the Pro message leads to a substantial decrease in requests for the follow-up email. The CSEs in the third column are also consistent with the SATEs in direction but only the effect on opinion changes (wherein exposure to the Pro message increases support) is statistically distinguishable from zero. Under low importance, there appears to be relatively little effect heterogeneity: effects for Pro-selectors and Con-selectors are fairly consistent with one another, meaning the SATEs provide inferences that seem to apply equally well to those preferring and not preferring the treatment.

Finally, panel (c) of Table \ref{tab:results} shows results for the high importance condition. Recall that under high importance, respondents were much more likely to self-select a treatment message congruent with their prior ($t1$) opinion than in the low importance condition (the odds-ratio was 4.25). The consequence of this for inferences about the effects of the treatment messages should be immediately clear: while the SATEs in this condition mirror those for the low importance condition and the sample as a whole (with the exception of the email message outcome), the PSEs and CSEs differ considerably. Under high importance, there is a very large positive PSE on argument evaluation (meaning the Pro message was seen much more favorably than the Con message) and opinions were moved nearly 20\% more supportive among Pro-selectors. The PSEs for the two information-seeking measures were not distinguishable from zero.

Looking at the CSEs in column 3, the effects on argument evaluation and opinion now are substantively negative, which indicates that the Pro message was seen as less effective than the Con message by these individuals and actually produced a backlash that made these respondents less supportive of the policy, though the large standard errors mean these effects are not statistically distinguishable from zero. Looking at the information-seeking measures, there is a substantively large positive effect of the Pro message on email requests but again the large standard errors mean this is difficult to distinguish from statistical noise. The high importance conditions therefore suggest a quite different pattern of causal effects than the low importance conditions, in both size and direction of effects for the Pro-selectors and Con-selectors.

These results are striking. As expected, the subsample effects vary widely between the high and low importance conditions. If one were to use these data to make inferences about the effects of exposure to a political argument, those inferences would differ substantially depending on whether one looked only at the randomized SATEs or at effects derived from the hybrid design, and those effects would differ even further depending on the level of attitude-congruent selective exposure driving the participants' choices of message. Thus relatively simple experimental design decisions could lead us to infer that the difference between receiving the Pro and Con messages on opinions was any of the following:

\begin{itemize*}
\item increased policy support (full sample, high importance, and low importance SATEs)
\item increased support only among those inclined to receive the message (full sample or high importance PSEs)
\item no effect on those inclined to receive it (low importance PSE)
\item no effect on those disinclined to receive it (full sample, high importance, and low importance CSEs)
\item possible backfire effect on those disinclined to receive it (high importance CSE)
\end{itemize*}

\noindent If we were interested in the hypothetical, universal application of the Pro rather than Con message to all members of the population, the SATE would tell us that such an intervention would increase policy support. If we were instead interested to know how receiving the Pro rather than Con message affected those who chose the Pro message --- as we would be, if we wanted to understand communication effects as they might actually occur in the real world --- then our inference about that effect depends on whether the issue is seen as personally important or not. If it is personally important, then those choosing the Pro message would be made more supportive by that exposure; if the issue were less important, then the effect among Pro selectors would likely be smaller (because that group would consist of a more attitudinally diverse set of the public). The results strongly suggest effect heterogeneity: those preferring a message respond to it differently from those who disprefer it.\footnote{Similarly interesting is the fact that the SATEs are very similar for the high and low importance groups despite the underlying group-specific effects differing considerably. This means that the randomized experiment offers a very similar inference about the effects of exposure to a political communication even when the underlying group-specific treatment effects for individuals who prefer the Pro and Con messages are quite different. While social scientists frequently acknowledge that the SATE is necessarily an average of underlying individual-level effects and may not represent the effect for any given unit or group of units, these results show that the SATE can be quite misleading.}


\section*{Discussion}\label{sec:discussion}

What do political communication researchers want to know? Is it what would happen if everyone were exposed to a political message? Or is it instead what effect a message has on individuals who are exposed to it? Arguably it could be both, but while we know quite a bit about the former due to increasing use of randomized experiments to study political phenomena, we know surprisingly little about the effects political communications would have on those who do and do not receive them in the real world. The present research shows that the SATE masks substantial effect heterogeneity among those who would be inclined to select distinct messages (i.e., self-select into the various treatment alternatives).

How does this change our understanding of political communications? The results showing heterogeneous effects bolster the argument --- made by researchers dating as far back as \citet{Hovland1959} --- that we should try to understand how messages affect those who would actually receive the message in the real world rather than just study the hypothetical effect of communicating the message to everyone, regardless of their information preferences. While the SATE might lead us to believe that a message increases policy support, thereby inviting a policy implication of expanding provision of the message, that average effect can mask a reality in which those inclined to receive the message are affected while those disinclined are not, even when they are forced to receive it (as was the case here). Or the treatment might have no effect on those who prefer it, while affecting only those who would never choose it of their own accord. The SATE can mislead us about who is affected and how much, and therefore misguide our understanding of the influence of communications on the public. If an experiment shows a treatment effect but it actually occurs only for those who would never choose it, what have we learned about political communication? Taking the leap from SATE to practical implications without acknowledging this heterogeneity might be fundamentally problematic.

Yet even as the present research highlights the value of incorporating choice elements in an experiment, the results demonstrate that uncovering such heterogeneity is complicated by the complexity of individuals' choice behavior. While a self-selection experiment presents a face valid form of experimental realism, the design is no panacea for ecological validity concerns. Pre-choice manipulations, like the importance manipulation used here, may be useful for better understanding such dynamics; the use of alternative choice sets may also clarify the conditional nature of treatment choices \citep[see, for example,][]{Leeper2014}. While the findings suggest incorporating self-selection into a randomized experiment can be fruitful, more research is needed on how best to study self-selection processes.


\singlespacing
\bibliographystyle{apsa-leeper}
\bibliography{References}
\clearpage




\appendix \label{appendix}
\section*{Appendix A. Survey Material} \label{sec:Questionnaire1}
\singlespacing
\textbf{Attitude Importance Manipulation:}

\noindent \emph{High:} A new law is currently moving through Congress that would require your electricity provider to purchase energy from renewable sources (e.g. wind and solar). This is relevant to you since it will influence your energy bills and the environment. The law would go into effect immediately.

\noindent \emph{Low:} A few legislators in Congress have proposed a bill that would require electricity providers to purchase energy from renewable sources (e.g. wind and solar). This is probably not directly relevant to you because Congress does not appear to be ready to act on the bill and even if they did it is unlikely to personally affect you.\\

\noindent \textbf{Information Choice Manipulation:}

\noindent \emph{Assigned:} [Randomly assigned to Pro or Con]\\

\noindent \emph{Choice:}
\noindent Before you proceed, please choose one of the following brief passages to read:

``Renewable Energy Rules Beneficial''

``Renewable Energy Rules Ineffective''\\

%\noindent [PAGE BREAK]\\

\noindent \textbf{Article Text:}

\noindent \emph{Pro:} ``Renewable Energy Rules Beneficial''\\The proposed federal law would create uniform nationwide standards which is necessary since many states have not adopted renewable energy provisions. The new standards would require electricity utilities to produce between 10\% and 30\% of their energy from renewable sources especially wind power as well as potentially innovative new sources of energy. This in turn would reduce pollution. The impact on consumers is also affordable: adopting a nationwide standard would increase monthly electricity bills by only about 1\%. Renewable standards therefore reduce reliance on fossil fuels for energy production without dramatically increasing costs to American consumers. \\

\noindent \emph{Con:} ``Renewable Energy Rules Ineffective''\\The proposed federal law would intervene in state policies and regulate private businesses to create uniform nationwide standards, where up until now many states have not adopted renewable energy provisions. The new standards would drive down innovation by requiring energy utilities to adopt specific technologies (e.g. wind power) rather than directly targeting the reduction of polluting greenhouse gas emissions. The impact on consumers is also problematic: adopting a nationwide standard would increase monthly electricity bills by up as much as 4\%. Renewable standards therefore increase the cost of energy through government regulation without directly addressing potential environmental impacts of energy production from fossil fuels.  \\

\section*{Appendix B. Supplementary Descriptive Statistics}
\singlespacing

\begin{center}
\textbf{Table B1a. Argument Evaluations, by Treatment Condition}\\
\input{tables/app-evaltab1.tex}
\end{center}
Note: Cell entries are treatment group means with standard errors in parentheses and higher values indicating greater perceived argument effectiveness (scaled 0-1).

\begin{center}
\textbf{Table B1b. Argument Evaluations, by Importance and Choice Condition}\\
\input{tables/app-evaltab2.tex}
\end{center}
Note: Cell entries are treatment group means with standard errors in parentheses and higher values indicating greater perceived argument effectiveness (scaled 0-1).

\clearpage

\begin{center}
\textbf{Figure B1. Mean Argument Evaluations, by Importance and Choice Condition and Congruence with $t1$ Opinion}\\
\includegraphics[width=\textwidth]{figures/evallevels.pdf}
\end{center}
Note: Figure displays mean argument evaluations (and bars representing one and two standard errors of the mean) by importance and choice condition, separately for arguments congruent or incongruent with respondents' $t1$ opinions.\\

\begin{center}
\textbf{Figure B2. Argument Evaluation Bias, by Importance and Choice Condition}\\
\includegraphics[width=\textwidth]{figures/evalbias.pdf}
\end{center}
Note: Figure displays the bias toward seeing attitude-congruent messages as more effective than attitude-incongruent messages. Points represent difference-in-differences estimates along with bars representing one and two associated standard errors, based on the mean argument ratings displayed in Figure B1.\\

\clearpage

\begin{center}
\textbf{Table B2a. $t1$ and $t2$ Opinions, by Treatment Condition}\\
\input{tables/app-opiniontab1.tex}
\end{center}
Note: Cell entries are treatment group means with standard errors in parentheses and higher values indicating greater support (scaled 0-1).\\

\begin{center}
\textbf{Table B2b. $t1$ and $t2$ Opinions, by Importance and Choice Condition}\\
\input{tables/app-opiniontab2.tex}
\end{center}
Note: Cell entries are treatment group means with standard errors in parentheses and higher values indicating greater support (scaled 0-1).\\

\clearpage


\begin{center}
\textbf{Table B3a. Information Seeking, by Treatment Condition}\\
\input{tables/app-infotab1.tex}
\end{center}
Note: Cell entries are treatment group means with standard errors in parentheses and higher values indicating greater planned information-seeking (scaled 0-1) and likelihood of requesting email (0/1).\\
\begin{center}

\textbf{Table B3b. Information Seeking, by Importance and Choice Condition}\\
\input{tables/app-infotab2.tex}
\end{center}
Note: Cell entries are treatment group means with standard errors in parentheses and higher values indicating greater planned information-seeking (scaled 0-1) and likelihood of requesting email (0/1).\\


\end{document}
